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The  Projected  Newton  Method  Has  Order  1  +  \/2  for  the 
Symmetric  Eigenvalue  Problem 

R.A.  Tapia*  David  L.  Whitley* 


Abstract 

In  their  study  of  the  classical  inverse  iteration  algorithm,  Peters 
and  Wilkinson  considered  the  closely  related  algorithm  that  consists 
of  applying  Newton’s  method,  followed  by  a  2-norm  normalization, 
to  the  nonlinear  system  of  equations  consisting  of  the  eigenvalue- 
eigenvector  equation  and  an  equation  requiring  the  eigenvector  to  have 
the  square  of  its  2-norm  equal  to  one.  They  argue  that  in  practice 
the  oo-norm  is  easier  to  work  with,  and  they  therefore  replace  the 
2-norm  normalization  equation  with  a  linear  equation  requiring  that 
a  particular  component  of  the  eigenvector  be  equal  to  one  (effectively 
an  oo-norm  normalization).  Next,  they  observe  that,  because  of  the 
linearity  of  the  normalization  equation,  the  normalization  step  is  au¬ 
tomatically  satisfied;  the  algorithm  thus  reduces  to  Newton’s  method 
and  quadratic  convergence  follows  from  standard  theory.  Peters  and 
Wilkinson  choose  to  dismiss  the  2-norm  formulation  in  favor  of  the  oo- 
norm  formulation;  one  factor  in  their  choice  seems  to  be  that  quadratic 
convergence  is  not  so  immediate  for  the  2-norm  formulation.  In  this 
work  we  establish  the  surprising  result  that  the  2-norm  formulation 
gives  a  convergence  rate  of  1  +  \/2,  significantly  superior  to  that  given 
by  the  Peters  and  Wilkinson  formulation. 

‘Mathematical  Sciences  Department,  Rice  University,  Houston,  Texas  77251-1892.  Re¬ 
search  sponsored  by  DOE  DE-FG05-86ER25017,  SDIO/IST  managed  under  ARO  DAAG- 
03-86- K-01 13,  and  AFOSR  85-0243. 
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1  Introduction 


In  their  study  of  the  classical  inverse  iteration  algorithm,  Peters  and  Wilkin¬ 
son  (1979)  considered  two  closely  related  algorithms.  They  began  by  ob¬ 
serving  that  in  finding  an  eigenvector-eigenvalue  pair  (x„,  A,)  of  a  given 
real  symmetric  matrix  A,  if  we  require  that  ||x||2  =  1,  then  the  pair  must 
satisfy 


(A-XI)x  =  0, 
\0-~xTx)  =  0. 


(1) 


They  then  suggest  the  following  scheme  for  solving  this  system:  let  x% x0  =  1 
for  the  initial  iterate  (xo,  Ao);  now  given  a  current  iterate  (x,  A),  let 
(Ax,  AA)  solve 


f  A- XI  -x  W  A  x  \  (A-  X  I)x  \ 

V  ~xT  0  /  V  AA  /  \  K1"  xT* )  / 


(2) 


and  let 


(x  4-  Ax) 

X+  ~  ||x  +  Ax||’  (3) 

A+  =  A  +  AA. 

Without  the  normalization  in  (3)  this  would  be  Newton’s  method  applied 
to  the  nonlinear  system  (1).  With  the  normalization  it  is  the  Projected 
Newton  Method. 


Motivated  by  the  fact  that  in  practice  it  is  usually  simpler  to  scale 
successive  x-iterates  so  that  a  particular  component  is  equal  to  one,  Peters 
and  Wilkinson  propose  using  a  different  normalization:  instead  of  finding 
a  solution  to  (1),  find  one  to 


( A  —  X  I)x  =  0, 
1  -  e^x  =  0; 


(4) 


where  it  is  assumed  that  the  mth  component  of  x*  is  one  of  its  larger  com¬ 
ponents.  The  analogous  iterative  scheme  for  solving  this  problem  requires 
that  e^xo  =  1,  and  that  (Ax,  A  A)  solve 


f  A  —  XI  —x\f  Ax  \  ~  A/)x  \ 

V  0  V  ) 


(5) 
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Here,  we  take 


(6) 


x+  =  x  +  Ax, 

A+  =  A  +  AA. 

Because  e^(Ai)  =  0,  each  iterate  satisfies  the  new  normalization;  there¬ 
fore,  Projected  Newton  coincides  with  Newton’s  method  on  the  system  (4). 
In  Section  4  of  Peters  and  Wilkinson  (1979),  there  is  an  argument  that  es¬ 
tablishes  that  the  matrix  in  Equation  (5)  is  nonsingular  at  (x„,  A«)  if  A,  is 
a  simple  eigenvalue.  Thus,  when  A,  is  a  simple  eigenvalue,  the  convergence 
of  this  scheme  is  clearly  q-quadratic  in  the  pair  (x,  A),  as  a  consequence  of 
the  standard  theory  for  Newton’s  Method. 

Wilkinson  (1981)  considered  extensions,  refinements  or  applications  of 
the  scheme  (6),  as  did  Dongarra,  Moler  and  Wilkinson  (1983).  The  latter 
paper  also  contains  a  proof  of  r-quadratic  convergence  of  the  x-iterates  for 
this  scheme,  which  follows  immediately  from  the  q-quadratic  convergence 
of  the  pair  (x,  A)  (The  definitions  of  r-  and  q-order  of  convergence  may  be 
found  in  Dennis  and  Schnabel  (1983),  pp.  19-21.  For  further  detail,  see 
Chapter  9  of  Ortega  and  Rheinboldt  (1970)).  In  numerical  experiments  to 
assess  the  behavior  of  the  two  schemes  described,  we  discovered  that  the 
second  did  indeed  seem  to  be  q-quadratically  convergent,  but  no  better;  the 
first,  however,  was  undeniably  faster  than  q-quadratic,  yet  not  q-cubic.  In 
§2  we  prove  our  main  result,  that  the  convergence  of  each  of  the  sequences 
Xk  and  Afc  generated  by  the  first  scheme  is  actually  of  q-order  1  +  \/2.  In 
§3,  we  add  concluding  remarks. 

Before  proceeding  to  the  main  result,  we  note  that  the  two  algorithms 
presented  above  are  closely  related  to  two  other  well-known  methods  for 
finding  an  eigenvalue-eigenvector  pair  for  a  symmetric  matrix  A:  inverse 
iteration  and  Rayleigh  Quotient  Iteration.  To  see  this,  note  that  the  first 
equation  in  either  (2)  or  (5)  implies  that 

(A  -  A/)x+  =  (A  -  A J)(x  +  Ax)  =  (AA)x.  (7) 

Given  a  current  eigenvector  estimate  x  and  eigenvalue  estimate  A  (for  in¬ 
verse  iteration  the  eigenvalue  estimate  is  fixed,  for  the  other  three  it  changes 
from  iteration  to  iteration),  equation  (7)  shows  that  each  of  the  latter  three 
methods  produces  a  new  eigenvector  estimate  that  is  a  scalar  multiple  of 
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the  one  given  by  inverse  iteration.  The  new  eigenvalue  estimates  for  these 
three  methods  are  also  related  to  each  other  in  an  interesting  way.  By  pre¬ 
multiplying  both  sides  of  (2)  by  (xT,  —  A)  and  rearranging  terms,  we  obtain 
the  expression 

A+  =  ^  (8) 

X1  x+ 

for  the  Projected  Newton  formulation.  Similarly,  we  can  obtain  from  (5) 
the  expression 


A+  = 


eTmAx+ 

emX+ 


for  the  Peters- Wilkinson  formulation.  For  the  Rayleigh  Quotient  Iteration, 
the  new  eigenvalue  estimate  is  given  by 


A  -f± 
A+  “ 


T  Ax  j 


T 

X+X+ 


Thus  each  method  has  the  effect  of  solving  ( A  —  A /)x+  =  x,  scaling  appro¬ 
priately,  and  updating  A  using  the  appropriate  formula.  This  shows  that 
our  result  is  of  a  theoretical  nature,  since  it  offers  few  if  any  real  computa¬ 
tional  advantages  over  the  Rayleigh  Quotient  Iteration,  the  convergence  of 
which  is  q-cubic. 


2  Main  result 


The  following  Lemma  gives  a  sufficient  condition  for  a  q-convergence  rate 
of  1  +  y/2. 

Lemma  Let  {a:*}  be  a  sequence  that  converges  to  x*.  If  there  exist 
positive  constants  m,  M,  and  k  such  that 


|]xfe+1-x,|| 

II®*  -  X,||2||®*-1  -  ®*|| 


<  M 


(9) 


for  all  k  >  k,  then  Xk  converges  to  x*  with  q-order  1  +  y/2. 
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Proof.  Clearly,  we  can  assume  k  =  1  without  loss  of  generality.  If  we 
define  ak  by 

_  [[Sfc+l  ~  x,|| 

"k  Ha:*  -  x,||1+v'2’ 

then  we  can  rewrite  (9)  as 

m  <  <  M. 

This  pair  of  inequalities  can  be  applied  repeatedly  to  yield 

<  c2  < 

and  so  on  (as  can  be  shown  by  induction);  in  general, 


ck  <  =  pk. 

Therefore,  lim  sup  a*.  <  lim  /?*.  =  Hence  for  all  sufficiently 

large  k , 

l|Xfc+1~X*^  =  +  1. 

||x*  -  x,||1+,/2 

Therefore  x*,  converges  to  x,  with  q-order  1  +  \/2.0 

From  now  on,  let  us  write  (x_,  A_),  (x,  A),  and  (x+,  A+),  where  con¬ 
venient,  in  place  of  (x^-i,  Aj-.x),  (x^,  A*),  and  (xfc+i,  A^+1).  We  now  state 
our  main  result: 

Theorem  Let  A  be  a  symmetric  matrix,  xo  a  vector  with  ||xo||  =  1,  and 
A0  a  scalar  that  is  not  an  eigenvalue  of  A.  Taking  (x,  A)  =  (x0,  A0)  initially, 
apply  the  following  steps  iteratively  to  generate  a  sequence  ( Xk ,  A*,); 


A  -  XI  -x 
-xT  0 


(A  -  A I)x 
0 


x+ 
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If  Ajt  converges  to  an  eigenvalue  A,  and  A*  ^  A*  for  all  k,  then  At  con¬ 
verges  with  q-order  1  -f-  y/2,  and  x k  converges  with  the  same  q-order  to  a 
corresponding  eigenvector  x». 


Remark.  We  note  that  neither  the  hypothesis  requiring  convergence  of 
At  nor  the  hypothesis  that  A*  ^  A,  should  be  considered  restrictive.  When 
A,  is  a  simple  eigenvalue,  the  Jacobian  matrix  is  nonsingular  at  (x»,  A,), 
so  the  standard  theory  for  Newton’s  method  ensures  that  the  sequence 
is  locally  quadratically  convergent  (at  least)  in  this  case.  When  A*  is  not 
simple,  convergence  does  not  follow  from  the  standard  theory,  but  in  view  of 
this  algorithm’s  resemblance  to  inverse  iteration,  it  is  reasonable  to  assume 
that  Xk  will  converge  to  an  eigenvector,  and  Equation  (8)  then  implies 
that  A k  converges  to  an  eigenvalue.  The  second  hypothesis  excludes  the 
possibility  that  a  finite  number  of  iterations  might  lead  to  an  exact  solution. 
If  Xk  were  an  eigenvector,  then  At+i  would  be  the  corresponding  eigenvalue, 
as  can  be  readily  seen  from  (8).  If  Xk  were  equal  to  an  eigenvalue,  then  the 
iteration  as  described  above  might  not  be  defined,  since  the  matrix  might 
be  singular;  in  any  case,  however,  (Ax,  AA)  =  (x»  —  x,  0)  would  solve  the 
associated  linear  system  (2)  for  some  eigenvector  x*  corresponding  to  the 
eigenvalue  A*.  Thus  an  exact  solution  would  be  found  in  one  more  iteration, 
provided  that  the  iteration  is  defined. 


Proof.  There  are  two  main  parts  to  the  proof  of  the  Theorem:  we  show 
first  that 

,  ^  l|x*+i-x.||  ^ 

h  <  - - rrn~ - —  <  “i  (10) 


and  then  that 


I2  £: 


I Afc  -  A„j||xfc  -  x. 

I  Ajt  -  A»  | 

Xk  -  £*|]  jjXfc-i  -  X. 


<  U2, 


(11) 


for  k  sufficiently  laxge,  where  l\,  iti,  l?,  and  u2  are  positive  constants.  From 
(10)  and  (11),  it  is  straightforward  to  show  that 


hh< 


||  X  k^-X  ®*| _ 

|x*,  -  X„||2||Xfc_i  -  x*|| 


<  UIU2, 


and 


Ijl 2  <  1  Afc+i  -  A«|  <  u\u2 

y-2  ~  | A*  —  A»|2| Ait_i  —  A*|  —  l2 
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We  can  then  apply  the  Lemma  to  conclude  that  both  x*  and  A  *  converge 
with  q-order  1  +  \/2  whenever  (10)  and  (11)  hold. 

For  the  first  part  of  the  proof,  we  parallel  the  proof  of  the  linear  conver¬ 
gence  of  the  power  method  given  by  Parlett  (1980).  We  begin  with  some 
notation:  let  S  be  the  subspace  of  eigenvectors  of  A  with  eigenvalue  A,,  and 
denote  the  orthogonal  complement  of  S  by  S1.  Define  x,  to  be  the  vector 
obtained  by  normalizing  the  projection  of  xo  onto  the  subspace  S.  Denote 
the  angle  between  x  and  x,  by  9,  and  the  angle  between  x+  and  x,  by  9+. 
Note  that  we  can  write  x  as 


x  =  x»  cos  9  +  u  sin  9 , 


(12) 


where  u  is  a  unit  vector  orthogonal  to  x,.  If  we  pre- multiply  both  sides  of 
(12)  by  ( A  —  A/)-1,  we  obtain 

^  =  *•  (£ri)  +  (imIaKii)  ^ 

By  (7),  ( A  —  A 7)-1x  is  a  scalar  multiple  of  x+;  since  x,  is  orthogonal  to 
(A  —  A/)-1u,  the  above  expression  shows  that  an  orthogonal  decomposition 
of  x+  is 

x+  =  x,  cos  9+  +  u+  sin  9+ , 


where 


and 


(A  -  A/)-1u 
||(A-AJ)-iu|| 


tan^+ 


||(i4  —  A  I)  1u||  sin^ 
cos  9/(  A,  —  A) 

(A,  -  A)||(A  -  AJ)-1u||  tan^. 


(13) 


By  the  choice  of  x„,  u0  €  S'1;  hence,  by  Equation  (13)  and  induction,  we 
can  conclude  that  Uk  G  S1  for  all  k.  Thus  we  can  obtain  upper  and  lower 
bounds  on  ||(A  —  AI)~1u||  by  considering  the  restriction  of  ( A  —  A/)-1  to 
Sx  (i.e.  the  reduced  resolvent),  which  we  denote  by  (A  —  A/)-x: 

iKA-AirM  =  iKA-A/r^n 

<  ||(A  —  AJ)_X||  (14) 

1 

I  "a  -  A|  ’ 
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where  u\  is  the  nearest  eigenvalue  of  A  to  A  other  than  A*.  Similarly, 

HM  -  AO-'-ll  >  j^rsi  (i«) 

where  <f>\  is  the  eigenvalue  of  A  farthest  from  A.  So  we  now  have  that 

tan  9 + 

—  tan  9  ~ 

It  is  clear  from  (16)  that  if  A^  converges  to  A,,  then  the  vector  we  have 
defined  to  be  x,  is  indeed  the  eigenvector  to  which  x*  converges. 

In  terms  of  the  error  angle  9,  the  error  in  x  is  given  by 

||x-x.||  =  2  sin  ^  (17) 

=  ^  —  cos#).  (18) 

From  (18),  the  following  relationship  can  be  derived  in  a  straightforward 
manner: 

tan  9 +  cos  9+  (  1  +  cos  9 
tan  9  cos  9  \  1  +  cos  9+ 

Combining  (16)  with  (19)  and  taking  limits,  we  may  conclude  that 


jjXj.  -  x,||  __ 
||x  -  x»|| 


A, -A 
<^A  —  A 


< 

< 


Hence  (10)  holds  with,  for  instance,  l\  =  2|0a1-a7|'  anc^  =  \Vx  -a.|~ 

We  now  show  that  there  exist  positive  I2  and  u2  satisfying  (11).  We  first 
use  (8)  to  write 

xT_{A  -  KI)x 
xLx 
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If  we  substitute  the  orthogonal  decompositions  of  x _  and  x  given  by  (12) 
into  (20),  we  obtain 

x  x  (x»  cos  6-  +  sin#_)T(^l  —  A,/)(x»  cos  &  +  u  sin0) 

A  A*  —  ~  Tp  " 

xtx 

( u _  sin#_)r(A  —  A */)(u  sin#) 
x:Ex  5 


since  (x„,  A*)  is  an  eigenvector-eigenvalue  pair  for  A.  Now  we  use  (17)  to 
relate  the  sines  of  the  error  angles  to  the  sizes  of  the  errors  in  x_  and  x, 
and  we  use  (13)  to  write  u  in  terms  of  u_  and  A_: 


A- A. 

X-  —  X*||||x  —  X*| 


0-  8 
cos  —  cos  | 

T 

xtx 

8-  8 
COS  —  COS  2 

xtx 


(u^(A  —  \_I)u  +  (A_  —  A„)iAit) 


Using  the  bounds  for  ||(A  —  A _/)  1u_||  from  (14)  and  (15),  then  taking 
limits,  it  follows  that 


1 


I^A.  -  A, 


<  lim  inf 


|Afc  -  A. 


<  lim  sup 


\\Xk  ~  ~  X.| 

|A*-A.| 


\\xk  -  x„||||xjt_i  -  X, 


< 


1 


Wx.  -  A,  | 


Hence  (11)  holds  with,  for  example,  l2  =  2^1_x<|  and  u2  =  2-\,\-  This 

completes  the  proof.  □ 


3  Conclusion 

The  motivation  for  this  work  was  the  intriguing  and,  to  us,  surprising  non¬ 
integral  superquadratic  convergence  rate  indicated  in  our  numerical  exper¬ 
iments.  We  were  also  quite  pleased  that  we  were  able  to  establish  a  q-rate 
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of  convergence  of  1  +  \/2  both  for  the  x-iterate  alone  and  for  the  A-iterate 
alone.  The  classical  Newton’s  method  theory  gives  q-quadratic  convergence 
in  the  pair  (x,  A)  for  the  Peters- Wilkinson  algorithm,  which  implies  an  r- 
quadratic  rate  in  each  of  x  and  A.  The  direct  proof  given  by  Dongarra, 
Moler  and  Wilkinson  (1983)  also  only  establishes  r-quadratic  convergence 
in  x  for  the  same  algorithm. 

We  experimented  with  several  problem  formulations  that  used  p-norms 
other  than  the  2-norm  and  oo-norm;  for  no  values  of  p  except  p  —  2  did  our 
estimated  convergence  rates  consistently  exceed  2.  On  the  basis  of  these 
experiments,  we  consider  it  extremely  unlikely  that  any  p-norm  formulation 
other  than  the  one  presented  here  produces  superquadratic  convergence. 
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